Abstract

Bayesian hypothesis testing is investigated when the prior probabilities of the hypotheses, taken as a random 
vector, are quantized. Nearest neighbor and centroid conditions are derived using mean Bayes risk error as a distortion 
measure for quantization. A high-resolution approximation to the distortion-rate function is also obtained. Human 
decision making in segregated populations is studied assuming Bayesian hypothesis testing with quantized priors. 

I. Introduction

Consider a hypothesis testing scenario in which an object is to be observed to determine which one of $M$
states, $\{h_0, \ldots, h_{M-1}\}$, it is in. The object has prior probability $p_m$ of being in state $h_m$, i.e.
$p_m = \Pr[H = h_m]$, and prior probability vector $p = [p_0 \cdots p_{M-1}]^T$ with $\sum_{m=0}^{M-1} p_m = 1$, which
is known to the decision maker. M-ary hypothesis testing with known prior probabilities calls for the Bayesian
formulation of the problem, for which the optimal decision rule minimizes Bayes risk [2].

Now consider the situation when there is a population of objects, each with its own prior probability vector drawn
from the distribution $f_P(p)$ supported on the $(M-1)$-dimensional probability simplex. If the prior probability
vector of each object were known perfectly to the decision maker before observation and hypothesis testing, then
the scenario would be no different than that of standard Bayesian hypothesis testing. However, we consider the case
in which the decision maker is constrained and can only work with at most $K$ different prior probability vectors.
Such a constraint is motivated by scenarios where the decision maker has finite memory or limited information
processing resources. Hence, when there are more than $K$ objects in the population, the decision maker must first
map the true prior probability vector of the object being observed to one of the $K$ available vectors and then
proceed to perform the optimal Bayesian hypothesis test, treating that vector as the prior probabilities of the object.

Although not the only such constrained scenario, one example is that of human decision making. One particular 
setting is a referee deciding whether a player has committed a foul using his or her noisy observation as well as
prior experience. Players commit fouls at different rates; some players are dirtier or more aggressive than others. 
It is this rate which is the prior probability for the 'foul committed' state. Hence, over the population of players, 
there is a distribution of prior probabilities. If the referee tunes the prior probability to the particular player on 
whose action the decision is to be made, decision-making performance is improved. 

Human decision makers, however, are limited in their information processing capacity and can only carry around 
seven, plus or minus two, categories without getting confused [3]. Consequently, the referee is limited and categorizes 
players into a small number of dirtiness levels, with associated representative prior probabilities, exactly the scenario 
described above. 

In this paper, the design of the mapping from prior probability vectors in the population to one of K representative 
probability vectors is approached as a quantization problem. Mean Bayes risk error (MBRE) is defined as a fidelity
criterion for the quantization of $f_P(p)$ and conditions are derived for a minimum MBRE quantizer. Some examples
of MBRE-optimal quantizers are given along with their performance in the low-rate quantization regime.
Distortion-rate functions are given for the high-rate quantization regime. Certain human decision-making tasks, as mentioned
above, may be modeled by quantized prior hypothesis testing due to certain suboptimalities in human information 
processing. Human decision making is analyzed in detail for segregated populations, revealing a mathematical 
model of social discrimination. 

Previous work that combines detection and quantization looks at the quantization of observed data, not prior 
probabilities, and also only approximates the Bayes risk function instead of working with it directly, e.g. [4]-[6] 
and references cited in [6]. In such work, there is a communication constraint between the sensor and the decision 
maker, but the decision maker has unconstrained processing capability. Our work deals with the opposite case:
there is no communication constraint between the sensor and the decision maker, but the decision
maker is constrained.

A brief look at imperfect priors appears in [7, Sec. 2.E], but optimal quantization is not considered. In [8], [9], 
it is shown that small deviations from the true prior yield small deviations in the Bayes risk. We are not aware of 
any previous work that has looked at quantization, clustering, or categorization of prior probabilities. 

In the remainder of the paper, we focus on binary hypothesis testing, $M = 2$. Section II defines the Bayes risk
error distortion and gives some of its properties. Section III discusses low-rate quantization and Section IV discusses
high-rate quantization. Some examples with a Gaussian measurement model are given in Section V. Section VI
considers the implications on human decision making and Section VII provides a summary and directions for future
work.



II. Bayes Risk Error 

In the binary Bayesian hypothesis testing problem for a given object, there are two hypotheses $h_0$ and $h_1$ with
prior probabilities $p_0 = \Pr[H = h_0]$ and $p_1 = \Pr[H = h_1] = 1 - p_0$, a noisy observation $Y$, and likelihoods
$f_{Y|H}(y|h_0)$ and $f_{Y|H}(y|h_1)$. Note that we consider a one-shot measurement $Y$, rather than a set of independent,
noisy measurements. A function $\hat{h}(y)$ is designed that uniquely maps every possible $y$ to either $h_0$ or $h_1$ in such a
way that the function is optimal with respect to Bayes risk $J = E[c(H_i, H_j)]$, an expectation over the non-negative
cost function $c(h_i, h_j)$. This gives the following specification for $\hat{h}(y)$:

$$ \hat{h}(\cdot) = \arg\min_{f(\cdot)} E[c(H, f(Y))], \tag{1} $$

where the expectation is over both $H$ and $Y$. It may be shown that the optimal decision rule $\hat{h}(y)$ is the likelihood
ratio test:

$$ \frac{f_{Y|H}(y|h_1)}{f_{Y|H}(y|h_0)} \; \underset{\hat{h}(y) = h_0}{\overset{\hat{h}(y) = h_1}{\gtrless}} \; \frac{p_0 (c_{10} - c_{00})}{(1 - p_0)(c_{01} - c_{11})}, \tag{2} $$

where $c_{ij} = c(h_i, h_j)$.

There are two types of errors, with the following probabilities:

$$ p_E^{I} = \Pr[\hat{h}(Y) = h_1 \mid H = h_0], $$
$$ p_E^{II} = \Pr[\hat{h}(Y) = h_0 \mid H = h_1]. $$

Bayes risk may be expressed in terms of those error probabilities as:

$$ J = (c_{10} - c_{00})\, p_0\, p_E^{I} + (c_{01} - c_{11})(1 - p_0)\, p_E^{II} + c_{00}\, p_0 + c_{11}(1 - p_0). \tag{3} $$

It is often of interest to assign no cost to correct decisions, i.e. $c_{00} = c_{11} = 0$, which we assume in the remainder
of this paper. In this case, the Bayes risk simplifies to:

$$ J(p_0) = c_{10}\, p_0\, p_E^{I}(p_0) + c_{01}(1 - p_0)\, p_E^{II}(p_0). \tag{4} $$

In (4), the dependence of the Bayes risk and error probabilities on $p_0$ has been explicitly noted. The error probabilities
depend on $p_0$ through $\hat{h}(\cdot)$, given in (2). The function $J(p_0)$ is zero at the points $p_0 = 0$ and $p_0 = 1$ and is
positive-valued, strictly concave, and continuous in the interval (0, 1) [2], [10], [11].

In the case when the true prior probability is $p_0$, but $\hat{h}(y)$ is designed according to (2) using some other value
$a$ substituted for $p_0$, there is mismatch, and the mismatched Bayes risk is:

$$ J(p_0, a) = c_{10}\, p_0\, p_E^{I}(a) + c_{01}(1 - p_0)\, p_E^{II}(a). \tag{5} $$

$J(p_0, a)$ is a linear function of $p_0$ with slope $(c_{10}\, p_E^{I}(a) - c_{01}\, p_E^{II}(a))$ and intercept $c_{01}\, p_E^{II}(a)$. Note that $J(p_0, a)$ is
tangent to $J(p_0)$ at $a$ and that $J(p_0, p_0) = J(p_0)$.

Definition 1: Let Bayes risk error $d(p_0, a)$ be the difference between the mismatched Bayes risk function $J(p_0, a)$
and the Bayes risk function $J(p_0)$:

$$ d(p_0, a) = J(p_0, a) - J(p_0) $$
$$ \quad = c_{10}\, p_0\, p_E^{I}(a) + c_{01}(1 - p_0)\, p_E^{II}(a) - c_{10}\, p_0\, p_E^{I}(p_0) - c_{01}(1 - p_0)\, p_E^{II}(p_0). \tag{6} $$

We now give properties of $d(p_0, a)$ as a function of $p_0$ and as a function of $a$.

Theorem 1: The Bayes risk error $d(p_0, a)$ is non-negative and only equal to zero when $p_0 = a$. As a function
of $p_0 \in (0, 1)$, it is continuous and strictly convex for all $a$.

Proof: Since $J(p_0)$ is a continuous and strictly concave function, and lines $J(p_0, a)$ are tangent to $J(p_0)$,
$J(p_0, a) \ge J(p_0)$ for all $p_0$ and $a$, with equality when $p_0 = a$. Consequently, $d(p_0, a)$ is non-negative and only
equal to zero when $p_0 = a$. Moreover, $d(p_0, a)$ is continuous and strictly convex in $p_0 \in (0, 1)$ for all $a$ because it
is the difference of a continuous linear function and a continuous strictly concave function. ■
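
The quantities in (4)-(6) can be evaluated numerically once a measurement model is fixed. The sketch below is illustrative only: it borrows the scalar Gaussian model of Section V (signal level `mu`, noise standard deviation `sigma`) as an assumption, and the function names are hypothetical rather than taken from the paper. It computes the Bayes risk, the mismatched Bayes risk, and the Bayes risk error, so that the non-negativity and convexity stated in Theorem 1 can be spot-checked.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a, mu=1.0, sigma=1.0, c10=1.0, c01=1.0):
    """Error probabilities of the likelihood ratio test (2) designed for prior a.

    Assumes the Gaussian model Y = s_m + W with s_0 = 0, s_1 = mu.
    """
    log_term = (sigma / mu) * math.log(c10 * a / (c01 * (1.0 - a)))
    p1 = q_func(mu / (2.0 * sigma) + log_term)  # decide h1 although H = h0
    p2 = q_func(mu / (2.0 * sigma) - log_term)  # decide h0 although H = h1
    return p1, p2


def mismatched_risk(p0, a, mu=1.0, sigma=1.0, c10=1.0, c01=1.0):
    """J(p0, a) as in (5): true prior p0, test designed for prior a."""
    p1, p2 = error_probs(a, mu, sigma, c10, c01)
    return c10 * p0 * p1 + c01 * (1.0 - p0) * p2


def bayes_risk(p0, **model):
    """J(p0) as in (4): the matched case a = p0."""
    return mismatched_risk(p0, p0, **model)


def bayes_risk_error(p0, a, **model):
    """d(p0, a) as in (6)."""
    return mismatched_risk(p0, a, **model) - bayes_risk(p0, **model)


if __name__ == "__main__":
    for a in (0.2, 0.5, 0.8):
        row = [round(bayes_risk_error(p / 10.0, a), 4) for p in range(1, 10)]
        print("a =", a, row)
```

Printing a row for a fixed `a` shows the error vanishing at `p0 = a` and growing on either side, which is the behavior the theorem describes.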

Theorem 2: For any deterministic likelihood ratio test $\hat{h}(\cdot)$, as a function of $a \in (0, 1)$ for all $p_0$, the Bayes risk
error $d(p_0, a)$ has exactly one stationary point, which is a minimum.

Proof: Consider the parameterized curve $(p_E^{I}, p_E^{II})$ traced out as $a$ is varied; this is a flipped version of the
receiver operating characteristic (ROC). The flipped ROC is a strictly convex function for deterministic likelihood
ratio tests. At its endpoints, it takes values $(p_E^{I} = 0, p_E^{II} = 1)$ when $a = 1$ and $(p_E^{I} = 1, p_E^{II} = 0)$ when $a = 0$ [2],
and therefore has average slope $-1$. By the mean value theorem and strict convexity, there exists a unique point
on the flipped ROC at which $\frac{dp_E^{II}}{dp_E^{I}} = -1$. To the left of that point: $-\infty < \frac{dp_E^{II}}{dp_E^{I}} < -1$, and to the right of that point:
$-1 < \frac{dp_E^{II}}{dp_E^{I}} < 0$.

For deterministic likelihood ratio tests, $\beta \frac{dp_E^{I}(a)}{da} < 0$ and $\gamma \frac{dp_E^{II}(a)}{da} > 0$ for all $a \in (0, 1)$ and positive constants $\beta$
and $\gamma$ [2]. Therefore, if $\frac{\gamma\, dp_E^{II}}{\beta\, dp_E^{I}} < -1$, i.e. $\frac{\gamma\, dp_E^{II}(a)/da}{\beta\, dp_E^{I}(a)/da} < -1$, then $\gamma \frac{dp_E^{II}(a)}{da} > -\beta \frac{dp_E^{I}(a)}{da}$ and $\beta \frac{dp_E^{I}(a)}{da} + \gamma \frac{dp_E^{II}(a)}{da} > 0$. In the same
manner, if $\frac{\gamma\, dp_E^{II}}{\beta\, dp_E^{I}} > -1$, then $\beta \frac{dp_E^{I}(a)}{da} + \gamma \frac{dp_E^{II}(a)}{da} < 0$.

Combining the above, we find that the function $\beta p_E^{I}(a) + \gamma p_E^{II}(a)$ has exactly one stationary point in (0, 1),
which occurs when the slope of the flipped ROC is $-\beta/\gamma$. Denote this stationary point as $a_s$. For $0 < a < a_s$,
$-1 < \frac{\gamma\, dp_E^{II}}{\beta\, dp_E^{I}} < 0$ and the slope of $\beta p_E^{I}(a) + \gamma p_E^{II}(a)$ is negative; for $a_s < a < 1$, $-\infty < \frac{\gamma\, dp_E^{II}}{\beta\, dp_E^{I}} < -1$ and the slope of
$\beta p_E^{I}(a) + \gamma p_E^{II}(a)$ is positive. Therefore, $a_s$ is a minimum.

As a function of $a$, the Bayes risk error is of the form $\beta p_E^{I}(a) + \gamma p_E^{II}(a) + C$. Hence, it also has exactly one
stationary point $a_s$, which is a minimum. ■

As seen in Section III, the above properties of $d(p_0, a)$ are useful to establish that the Lloyd-Max conditions are
not only necessary, but also sufficient for quantizer local optimality.

The third derivative of $d(p_0, a)$ with respect to $p_0$ is:

$$ \frac{\partial^3 d(p_0, a)}{\partial p_0^3} = -c_{10}\, p_0 \frac{d^3 p_E^{I}}{dp_0^3} - 3 c_{10} \frac{d^2 p_E^{I}}{dp_0^2} - c_{01}(1 - p_0) \frac{d^3 p_E^{II}}{dp_0^3} + 3 c_{01} \frac{d^2 p_E^{II}}{dp_0^2}, \tag{7} $$

when the constituent derivatives exist. As seen in Section IV, when the third derivative exists and is continuous,
$d(p_0, a)$ is locally quadratic, which is useful to develop high-rate quantization theory for Bayes risk error fidelity
[12].



III. Low-Rate Quantization 

Fig. 1. The intersection of the lines $J(p_0, a_k)$, tangent to $J(p_0)$ at $a_k$, and $J(p_0, a_{k+1})$, tangent to $J(p_0)$ at $a_{k+1}$, is the optimal interval
boundary.

The conditions necessary for the optimality of a quantizer for $f_{P_0}(p_0)$ under Bayes risk error distortion are now
derived. A $K$-point quantizer partitions the interval [0, 1] into $K$ regions $\mathcal{R}_1, \mathcal{R}_2, \mathcal{R}_3, \ldots, \mathcal{R}_K$. For each of these
quantization regions $\mathcal{R}_k$, there is a representation point $a_k$ to which elements are mapped. For regular quantizers,
the regions are subintervals $\mathcal{R}_1 = [0, b_1]$, $\mathcal{R}_2 = (b_1, b_2]$, $\mathcal{R}_3 = (b_2, b_3]$, ..., $\mathcal{R}_K = (b_{K-1}, 1]$ and the representation
points $a_k$ are in $\mathcal{R}_k$.¹ A quantizer can be viewed as a nonlinear function $v_K(\cdot)$ such that $v_K(p_0) = a_k$ for $p_0 \in \mathcal{R}_k$.
For a given $K$, we would like to find the quantizer that minimizes the MBRE:

$$ D = E[d(P_0, v_K(P_0))] = \int_0^1 d(p_0, v_K(p_0))\, f_{P_0}(p_0)\, dp_0. \tag{8} $$

There is no closed-form solution, but an optimal quantizer must satisfy the nearest neighbor condition, the centroid
condition, and the zero probability of boundary condition [13]. The nearest neighbor and centroid conditions are
developed for MBRE in the following subsections. When $f_{P_0}(p_0)$ is absolutely continuous, the zero probability of
boundary condition is always satisfied.

A. Nearest Neighbor Condition 

With the representation points $\{a_k\}$ fixed, an expression for the interval boundaries is derived. Given any
$p_0 \in [a_k, a_{k+1}]$, if $J(p_0, a_k) < J(p_0, a_{k+1})$ then Bayes risk error is minimized if $p_0$ is represented by $a_k$, and
if $J(p_0, a_k) > J(p_0, a_{k+1})$ then Bayes risk error is minimized if $p_0$ is represented by $a_{k+1}$. The boundary point
$b_k \in [a_k, a_{k+1}]$ is the abscissa of the point at which the lines $J(p_0, a_k)$ and $J(p_0, a_{k+1})$ intersect. The idea is
illustrated graphically in Fig. 1.

By manipulating the slopes and intercepts of $J(p_0, a_k)$ and $J(p_0, a_{k+1})$, the point of intersection is found to be:

$$ b_k = \frac{c_{01} \left( p_E^{II}(a_{k+1}) - p_E^{II}(a_k) \right)}{c_{01} \left( p_E^{II}(a_{k+1}) - p_E^{II}(a_k) \right) - c_{10} \left( p_E^{I}(a_{k+1}) - p_E^{I}(a_k) \right)}. \tag{9} $$
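
As a rough numerical illustration of (9), the following sketch computes the boundary between two adjacent representation points and confirms that the two tangent lines really do meet there. It assumes the Gaussian measurement model of Section V; the helper names (`error_probs`, `boundary`) are hypothetical and chosen only for this example.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a, mu=1.0, sigma=1.0, c10=1.0, c01=1.0):
    """(p_E^I(a), p_E^II(a)) under the assumed Gaussian model."""
    log_term = (sigma / mu) * math.log(c10 * a / (c01 * (1.0 - a)))
    return (q_func(mu / (2.0 * sigma) + log_term),
            q_func(mu / (2.0 * sigma) - log_term))


def mismatched_risk(p0, a, mu=1.0, sigma=1.0, c10=1.0, c01=1.0):
    """J(p0, a) from (5)."""
    p1, p2 = error_probs(a, mu, sigma, c10, c01)
    return c10 * p0 * p1 + c01 * (1.0 - p0) * p2


def boundary(a_lo, a_hi, mu=1.0, sigma=1.0, c10=1.0, c01=1.0):
    """Nearest neighbor boundary b_k from (9) for a_lo < a_hi."""
    p1_lo, p2_lo = error_probs(a_lo, mu, sigma, c10, c01)
    p1_hi, p2_hi = error_probs(a_hi, mu, sigma, c10, c01)
    num = c01 * (p2_hi - p2_lo)
    den = num - c10 * (p1_hi - p1_lo)
    return num / den


if __name__ == "__main__":
    b = boundary(0.2, 0.5, c10=1.0, c01=4.0)
    print("boundary:", b)
    # Both tangent lines should take the same value at the boundary.
    print(mismatched_risk(b, 0.2, c10=1.0, c01=4.0),
          mismatched_risk(b, 0.5, c10=1.0, c01=4.0))
```

With equal costs the model is symmetric about one half, so representation points placed symmetrically (for instance 0.3 and 0.7) should give a boundary at 0.5.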



B. Centroid Condition 

With the quantization regions fixed, the MBRE is to be minimized over the $\{a_k\}$. Here, the MBRE is expressed
as the sum of integrals over quantization regions:

$$ D = \sum_{k=1}^{K} \int_{\mathcal{R}_k} \left( J(p_0, a_k) - J(p_0) \right) f_{P_0}(p_0)\, dp_0. \tag{10} $$

Because the regions are fixed, the minimization may be performed for each interval separately.

Let us define $I_k^{I} = \int_{\mathcal{R}_k} p_0\, f_{P_0}(p_0)\, dp_0$ and $I_k^{II} = \int_{\mathcal{R}_k} (1 - p_0)\, f_{P_0}(p_0)\, dp_0$, which are conditional means. Then:

$$ a_k = \arg\min_{a} \left\{ c_{10}\, I_k^{I}\, p_E^{I}(a) + c_{01}\, I_k^{II}\, p_E^{II}(a) \right\}. \tag{11} $$

Since $\beta p_E^{I}(a) + \gamma p_E^{II}(a)$ has exactly one stationary point, which is a minimum (cf. Theorem 2), equation (11) is
uniquely minimized by setting its derivative equal to zero. Thus, $a_k$ is the solution to:

$$ c_{10}\, I_k^{I} \left. \frac{dp_E^{I}(a)}{da} \right|_{a_k} + c_{01}\, I_k^{II} \left. \frac{dp_E^{II}(a)}{da} \right|_{a_k} = 0. \tag{12} $$



1 Due to the strict convexity of $d(p_0, a)$ in $p_0$ for all $a$ shown in Theorem 1, quantizers that satisfy the necessary conditions for MBRE
optimality are regular, see [13, Lemma 6.2.1]. Therefore, only regular quantizers are considered.

Commonly, differentiation of the two error probabilities is tractable; they are themselves integrals of the likelihood
functions and the differentiation is with respect to some function of the limits of integration.

C. Lloyd-Max Algorithm 

Alternating between the nearest neighbor and centroid conditions, the iterative Lloyd-Max algorithm can be 
applied to find minimum MBRE quantizers [13]. The algorithm is widely used because of its simplicity, effectiveness,
and convergence properties [14]. 

In [15], it is shown that the conditions necessary for optimality of the quantizer are also sufficient conditions
for local optimality² if the following hold. The first condition is that $f_{P_0}(p_0)$ must be positive and continuous in
(0, 1). The second condition is that $\int_0^1 d(p_0, a)\, f_{P_0}(p_0)\, dp_0$ must be finite for all $a$. The first and second conditions
are met by common distributions such as the beta distribution [16].

The third condition is that the distortion function $d(p_0, a)$ must satisfy some properties. It must be zero only for
$p_0 = a$, continuous in $p_0$ for all $a$, and convex in $a$; the first two of these hold as discussed in Theorem 1. The
third, convexity in $a$, does not hold for Bayes risk error in general, but the convexity of $d(p_0, a)$ in $a$ is only used
by [15] to show that a unique minimum exists. As shown in Theorem 2, $d(p_0, a)$ has a unique stationary point that
is a minimum. Therefore, the analysis of [15] applies to Bayes risk error distortion. Thus, if $f_{P_0}(p_0)$ satisfies the
first and second conditions, then the algorithm is guaranteed to converge to a local optimum. The algorithm may
be run many times with different initializations to find the global optimum.
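
The alternation between the two conditions can be sketched in a few lines. The code below is a minimal, non-authoritative illustration rather than the authors' implementation: it assumes the Gaussian measurement model of Section V with $\mu = \sigma = 1$, integrates the prior density with a simple midpoint rule, and solves the centroid problem (11) by golden-section search, which is justified only because Theorem 2 guarantees a single minimum. All function names are hypothetical.

```python
import math


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a, c10=1.0, c01=1.0):
    """(p_E^I(a), p_E^II(a)) for the assumed Gaussian model, mu = sigma = 1."""
    t = math.log(c10 * a / (c01 * (1.0 - a)))
    return q_func(0.5 + t), q_func(0.5 - t)


def boundary(a_lo, a_hi, c10=1.0, c01=1.0):
    """Nearest neighbor condition (9); requires a_lo != a_hi."""
    p1_lo, p2_lo = error_probs(a_lo, c10, c01)
    p1_hi, p2_hi = error_probs(a_hi, c10, c01)
    num = c01 * (p2_hi - p2_lo)
    return num / (num - c10 * (p1_hi - p1_lo))


def region_moments(lo, hi, density, n=2000):
    """Midpoint-rule estimates of I_k^I and I_k^II over [lo, hi]."""
    h = (hi - lo) / n
    i1 = i2 = 0.0
    for j in range(n):
        p = lo + (j + 0.5) * h
        w = density(p) * h
        i1 += p * w
        i2 += (1.0 - p) * w
    return i1, i2


def centroid(lo, hi, density, c10=1.0, c01=1.0, tol=1e-10):
    """Centroid condition (11), solved by golden-section search over a."""
    i1, i2 = region_moments(lo, hi, density)

    def objective(a):
        p1, p2 = error_probs(a, c10, c01)
        return c10 * i1 * p1 + c01 * i2 * p2

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = 1e-9, 1.0 - 1e-9
    while right - left > tol:
        x1 = right - ratio * (right - left)
        x2 = left + ratio * (right - left)
        if objective(x1) < objective(x2):
            right = x2
        else:
            left = x1
    return 0.5 * (left + right)


def lloyd_max(num_points, density, c10=1.0, c01=1.0, iters=200, tol=1e-6):
    """Alternate (9) and (11) from a uniform initialization."""
    a = [(k + 0.5) / num_points for k in range(num_points)]
    b = [0.0, 1.0]
    for _ in range(iters):
        inner = [boundary(a[k], a[k + 1], c10, c01) for k in range(num_points - 1)]
        b = [0.0] + inner + [1.0]
        new_a = [centroid(b[k], b[k + 1], density, c10, c01) for k in range(num_points)]
        shift = max(abs(x - y) for x, y in zip(a, new_a))
        a = new_a
        if shift < tol:
            break
    return a, b


if __name__ == "__main__":
    points, edges = lloyd_max(3, lambda p: 1.0)  # uniform prior density
    print("representation points:", points)
    print("boundaries:", edges)
```

Because only a local optimum is guaranteed, a practical run would repeat `lloyd_max` from several initializations, as the text above notes; this sketch uses a single uniform start for brevity.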

Further conditions on $d(p_0, a)$ and $f_{P_0}(p_0)$ are given in [15] for there to be a unique locally optimal quantizer,
i.e. the global optimum. If these further conditions for unique local optimality hold, then the algorithm is guaranteed
to find the globally minimum MBRE quantizer.

In many practical situations, the distribution $f_{P_0}(p_0)$ is not available, but data drawn from it is available. The
optimal design of quantizers from data is NP-hard [17], [18]. However, the Lloyd-Max algorithm and its close
cousin $K$-means can be used on data with the Bayes risk error fidelity criterion. In fact, as the size of the dataset
increases, the sequence of quantizers designed from data converges to the quantizer designed from $f_{P_0}(p_0)$ [19],
[20]. (Conditions on the distortion function given in [20] except convexity in $a$ are met by the Bayes risk error,
but in a similar way to the sufficiency of the Lloyd-Max conditions, the unique minimum property of the Bayes
risk error is enough.)

D. Monotonic Convergence in K

Let $D^*(K) = \sum_{k=1}^{K} \int_{\mathcal{R}_k^*} d(p_0, a_k^*)\, f_{P_0}(p_0)\, dp_0$ denote the MBRE for an optimal $K$-point quantizer. We show
that $D^*(K)$ monotonically converges as $K$ increases. The MBRE-optimal $K$-point quantizer is the solution to the
following problem:

$$ \begin{aligned} \text{minimize} \quad & \sum_{k=1}^{K} \int_{b_{k-1}}^{b_k} d(p_0, a_k)\, f_{P_0}(p_0)\, dp_0 \\ \text{such that} \quad & b_0 = 0 \\ & b_K = 1 \\ & b_{k-1} \le a_k, \quad k = 1, \ldots, K \\ & a_k \le b_k, \quad k = 1, \ldots, K. \end{aligned} \tag{13} $$

Let us add the additional constraint $b_{K-1} = 1$ to (13), forcing $a_K = 1$ and degeneracy of the $K$th quantization
region. The optimization problem for the $K$-point quantizer (13) with the additional constraint is equivalent to the
optimization problem for the $(K-1)$-point quantizer. Thus, the $(K-1)$-point design problem and the $K$-point design
problem have the same objective function, but the $(K-1)$-point problem has an additional constraint. Therefore,

$$ D^*(K-1) \ge D^*(K). $$

Since $d(p_0, v_K(p_0)) \ge 0$, $D = E[d(P_0, v_K(P_0))] \ge 0$. Since the sequence $D^*(K)$ is nonincreasing and bounded
from below, it converges. Mean Bayes risk error cannot get worse when more quantization levels are employed.

2 By local optimality, it is meant that the $\{a_k\}$ and $\{b_k\}$ minimize the objective function (8) among feasible representation and boundary
points near them.

In typical settings, as in Section V, performance always improves with an increase in the number of quantization
levels.

IV. High-Rate Quantization 

Let us apply high-rate quantization theory [14] to the study of minimum MBRE quantization. The distortion
function for the MBRE criterion has a positive second derivative in $p_0$ (due to strict convexity) and for many
families of likelihood functions, it has a continuous third derivative, see (7). Thus, it is locally quadratic in the
sense of Li et al. [12] and, in a manner similar to many perceptual, non-difference distortion functions, the high-rate
quantization theory is well-developed.

At high rate, i.e. $K$ large, if we let:

$$ B(p_0) = -\tfrac{1}{2} c_{10}\, p_0 \frac{d^2 p_E^{I}}{dp_0^2} - c_{10} \frac{dp_E^{I}}{dp_0} - \tfrac{1}{2} c_{01}(1 - p_0) \frac{d^2 p_E^{II}}{dp_0^2} + c_{01} \frac{dp_E^{II}}{dp_0}, \tag{14} $$

then $d(p_0, a_k)$ is approximated by the following second order Taylor expansion:

$$ d(p_0, a_k) \approx B(p_0)\big|_{p_0 = a_k} (p_0 - a_k)^2, \quad p_0 \in \mathcal{R}_k. \tag{15} $$

Assuming that $f_{P_0}(\cdot)$ is sufficiently smooth and substituting (15) into the objective of (13), the MBRE is
approximated by:

$$ D \approx \sum_{k=1}^{K} f_{P_0}(a_k)\, B(a_k) \int_{\mathcal{R}_k} (p_0 - a_k)^2\, dp_0. \tag{16} $$

The MBRE is greater than and approximately equal to the following lower bound, derived in [12] by relationships
involving normalized moments of inertia of intervals $\mathcal{R}_k$ and by Hölder's inequality:

$$ D_L = \frac{1}{12 K^2} \int_0^1 B(p_0)\, f_{P_0}(p_0)\, \lambda(p_0)^{-2}\, dp_0, \tag{17} $$

where the optimal quantizer point density is:

$$ \lambda(p_0) = \frac{\left( B(p_0)\, f_{P_0}(p_0) \right)^{1/3}}{\int_0^1 \left( B(p_0)\, f_{P_0}(p_0) \right)^{1/3} dp_0}. \tag{18} $$

Integrating a quantizer point density over an interval yields the fraction of the $\{a_k\}$ that are in that interval.
Substituting (18) into (17) yields:

$$ D_L = \frac{1}{12 K^2} \left\| B(p_0)\, f_{P_0}(p_0) \right\|_{1/3}. \tag{19} $$
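
To make (18) and (19) concrete, here is a small numerical sketch. It treats $B(\cdot)$ and $f_{P_0}(\cdot)$ as arbitrary user-supplied callables (an assumption made for generality), uses a plain midpoint rule for the integrals, and interprets $\|g\|_{1/3}$ as $(\int_0^1 g^{1/3})^3$, which is the form that results from substituting (18) into (17). The function names are illustrative.

```python
def norm_one_third(g, n=20000):
    """||g||_{1/3} = (integral of g^(1/3) over [0, 1])^3, by the midpoint rule."""
    h = 1.0 / n
    total = sum(g((j + 0.5) * h) ** (1.0 / 3.0) for j in range(n)) * h
    return total ** 3


def point_density(b_func, density, n=20000):
    """Return lambda(p0) from (18) as a callable."""
    h = 1.0 / n
    normalizer = sum(
        (b_func((j + 0.5) * h) * density((j + 0.5) * h)) ** (1.0 / 3.0)
        for j in range(n)
    ) * h
    return lambda p0: (b_func(p0) * density(p0)) ** (1.0 / 3.0) / normalizer


def high_rate_distortion(num_points, b_func, density):
    """D_L from (19) for a quantizer with num_points levels."""
    prod = lambda p0: b_func(p0) * density(p0)
    return norm_one_third(prod) / (12.0 * num_points ** 2)


if __name__ == "__main__":
    flat = lambda p0: 1.0
    ramp = lambda p0: 2.0 * p0  # a toy density on [0, 1], for illustration only
    print(high_rate_distortion(4, flat, flat))  # constant B and f: 1 / (12 * 16)
    print(high_rate_distortion(8, flat, ramp))
```

For constant $B$ and a uniform density the point density is flat and $D_L$ reduces to $1/(12K^2)$, the familiar uniform-quantizer value for squared error, which gives a quick sanity check on the implementation.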
V. Examples 

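As an example, let us consider the following scalar signal and measurement model:

$$ Y = s_m + W, \quad m \in \{0, 1\}, \tag{20} $$

where $s_0 = 0$ and $s_1 = \mu$ (a known, deterministic quantity), and $W$ is a zero-mean, Gaussian random variable with
variance $\sigma^2$. The likelihoods are:

$$ \begin{aligned} f_{Y|H}(y|h_0) &= \mathcal{N}(y; 0, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-y^2 / 2\sigma^2}, \\ f_{Y|H}(y|h_1) &= \mathcal{N}(y; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(y - \mu)^2 / 2\sigma^2}. \end{aligned} \tag{21} $$

The two error probabilities are:

$$ \begin{aligned} p_E^{I}(p_0) &= Q\!\left( \frac{\mu}{2\sigma} + \frac{\sigma}{\mu} \ln \frac{c_{10}\, p_0}{c_{01}(1 - p_0)} \right), \\ p_E^{II}(p_0) &= Q\!\left( \frac{\mu}{2\sigma} - \frac{\sigma}{\mu} \ln \frac{c_{10}\, p_0}{c_{01}(1 - p_0)} \right), \end{aligned} \tag{22} $$

where:

$$ Q(\alpha) = \frac{1}{\sqrt{2\pi}} \int_{\alpha}^{\infty} e^{-x^2/2}\, dx. $$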
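
Finding the centroid condition, the derivatives of the error probabilities are:

$$ \left. \frac{dp_E^{I}(p_0)}{dp_0} \right|_{a_k} = -\frac{1}{\sqrt{2\pi}}\, \frac{\sigma}{\mu}\, \frac{1}{a_k (1 - a_k)} \exp\!\left( -\frac{1}{8\mu^2\sigma^2} \left( \mu^2 + 2\sigma^2 \ln \frac{c_{10}\, a_k}{c_{01}(1 - a_k)} \right)^{2} \right), \tag{23} $$

$$ \left. \frac{dp_E^{II}(p_0)}{dp_0} \right|_{a_k} = \frac{1}{\sqrt{2\pi}}\, \frac{\sigma}{\mu}\, \frac{1}{a_k (1 - a_k)} \exp\!\left( -\frac{1}{8\mu^2\sigma^2} \left( \mu^2 - 2\sigma^2 \ln \frac{c_{10}\, a_k}{c_{01}(1 - a_k)} \right)^{2} \right). \tag{24} $$

By substituting these derivatives into (12) and simplifying, the following expression is obtained for the representation
points:

$$ a_k = \frac{I_k^{I}}{I_k^{I} + I_k^{II}}. \tag{25} $$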
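
The closed-form representation point can be checked against the stationarity condition (12) directly. The sketch below is illustrative: it uses a Beta(5, 2) prior density and an arbitrary region [0.6, 0.9] as assumptions, computes $a_k$ as the ratio $I_k^{I} / (I_k^{I} + I_k^{II})$ that follows from substituting the Gaussian-model derivatives into (12), and evaluates the left-hand side of (12) at that point. The function names are hypothetical.

```python
import math


def beta_5_2_pdf(p):
    """Beta(5, 2) density, 30 p^4 (1 - p), used here only as an example prior."""
    return 30.0 * p ** 4 * (1.0 - p)


def region_moments(lo, hi, density, n=20000):
    """Midpoint-rule estimates of I_k^I and I_k^II over [lo, hi]."""
    h = (hi - lo) / n
    i1 = i2 = 0.0
    for j in range(n):
        p = lo + (j + 0.5) * h
        w = density(p) * h
        i1 += p * w
        i2 += (1.0 - p) * w
    return i1, i2


def centroid_closed_form(lo, hi, density):
    """Representation point a_k = I^I / (I^I + I^II) for the Gaussian model."""
    i1, i2 = region_moments(lo, hi, density)
    return i1 / (i1 + i2), i1, i2


def stationarity_residual(a, i1, i2, mu=1.0, sigma=1.0, c10=1.0, c01=1.0):
    """Left-hand side of (12), using the Gaussian-model derivatives of p_E^I, p_E^II."""
    log_ratio = math.log(c10 * a / (c01 * (1.0 - a)))
    t1 = mu / (2.0 * sigma) + (sigma / mu) * log_ratio
    t2 = mu / (2.0 * sigma) - (sigma / mu) * log_ratio
    scale = sigma / (mu * math.sqrt(2.0 * math.pi)) / (a * (1.0 - a))
    dp1 = -scale * math.exp(-0.5 * t1 * t1)
    dp2 = scale * math.exp(-0.5 * t2 * t2)
    return c10 * i1 * dp1 + c01 * i2 * dp2


if __name__ == "__main__":
    a_k, i1, i2 = centroid_closed_form(0.6, 0.9, beta_5_2_pdf)
    print("a_k =", a_k)
    print("residual of (12):", stationarity_residual(a_k, i1, i2, c10=1.0, c01=4.0))
```

The residual should vanish up to rounding for any choice of costs, since the costs cancel in the simplification; the representation point is simply the conditional mean of $P_0$ over the region.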



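For high-rate analysis, the second derivatives of the error probabilities are needed. They are:

$$ \frac{d^2 p_E^{I}(p_0)}{dp_0^2} = -\frac{1}{\sqrt{8\pi}}\, \frac{\sigma}{\mu}\, \frac{1}{p_0^2 (1 - p_0)^2} \exp\!\left( -\frac{1}{8\mu^2\sigma^2} \left( \mu^2 + 2\sigma^2 \ln \frac{c_{10}\, p_0}{c_{01}(1 - p_0)} \right)^{2} \right) \left[ -3 + 4p_0 - \frac{2\sigma^2}{\mu^2} \ln \frac{c_{10}\, p_0}{c_{01}(1 - p_0)} \right] \tag{26} $$

and:

$$ \frac{d^2 p_E^{II}(p_0)}{dp_0^2} = \frac{1}{\sqrt{8\pi}}\, \frac{\sigma}{\mu}\, \frac{1}{p_0^2 (1 - p_0)^2} \exp\!\left( -\frac{1}{8\mu^2\sigma^2} \left( \mu^2 - 2\sigma^2 \ln \frac{c_{10}\, p_0}{c_{01}(1 - p_0)} \right)^{2} \right) \left[ -1 + 4p_0 - \frac{2\sigma^2}{\mu^2} \ln \frac{c_{10}\, p_0}{c_{01}(1 - p_0)} \right]. \tag{27} $$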



By inspection, we note that the third derivatives are continuous. Substituting the first derivatives (23)-(24) and
second derivatives (26)-(27) into (14), an expression for $B(p_0)$ can be obtained.
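
One way to gain confidence in the resulting $B(p_0)$ is to compare it with a finite-difference estimate. Since $J(p_0, a)$ is linear in $p_0$, the second derivative of $d(p_0, a)$ in $p_0$ equals $-J''(p_0)$, so $B(p_0) = -\tfrac{1}{2} J''(p_0)$. The sketch below, with hypothetical function names, evaluates (14) from the analytic Gaussian-model derivatives and checks it against a central difference of the Bayes risk (4).

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def _terms(p0, mu, sigma, c10, c01):
    """Log-ratio, the two Q-function arguments, and 1 / (p0 (1 - p0))."""
    log_ratio = math.log(c10 * p0 / (c01 * (1.0 - p0)))
    t1 = mu / (2.0 * sigma) + (sigma / mu) * log_ratio
    t2 = mu / (2.0 * sigma) - (sigma / mu) * log_ratio
    return log_ratio, t1, t2, 1.0 / (p0 * (1.0 - p0))


def b_analytic(p0, mu=1.0, sigma=1.0, c10=1.0, c01=1.0):
    """B(p0) from (14) with first and second derivatives of the error probabilities."""
    log_ratio, t1, t2, g = _terms(p0, mu, sigma, c10, c01)
    k = sigma / (mu * SQRT_2PI)
    e1 = math.exp(-0.5 * t1 * t1)
    e2 = math.exp(-0.5 * t2 * t2)
    s = 2.0 * sigma ** 2 / mu ** 2 * log_ratio
    dp1 = -k * g * e1
    dp2 = k * g * e2
    d2p1 = -0.5 * k * g * g * e1 * (-3.0 + 4.0 * p0 - s)
    d2p2 = 0.5 * k * g * g * e2 * (-1.0 + 4.0 * p0 - s)
    return (-0.5 * c10 * p0 * d2p1 - c10 * dp1
            - 0.5 * c01 * (1.0 - p0) * d2p2 + c01 * dp2)


def bayes_risk(p0, mu=1.0, sigma=1.0, c10=1.0, c01=1.0):
    """J(p0) from (4) for the Gaussian model."""
    _, t1, t2, _ = _terms(p0, mu, sigma, c10, c01)
    q = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))
    return c10 * p0 * q(t1) + c01 * (1.0 - p0) * q(t2)


def b_numeric(p0, h=1e-4, **model):
    """-J''(p0) / 2 estimated by a central difference."""
    second = (bayes_risk(p0 + h, **model) - 2.0 * bayes_risk(p0, **model)
              + bayes_risk(p0 - h, **model)) / (h * h)
    return -0.5 * second


if __name__ == "__main__":
    for p0 in (0.2, 0.5, 0.8):
        print(p0, b_analytic(p0, c01=4.0), b_numeric(p0, c01=4.0))
```

Agreement between the two columns is only a consistency check on the formulas under the stated Gaussian assumption, not an independent derivation.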

Examples with different distributions $f_{P_0}(p_0)$ are presented below. All of the examples use scalar signals with
additive Gaussian noise, $\mu = 1$, $\sigma = 1$ in (20). As a point of reference, a comparison is made to quantizers designed
under mean absolute error (MAE) [21], i.e. $d(p_0, a) = |p_0 - a|$, an objective that does not account for hypothesis
testing.³

In the high-rate comparisons, the optimal point density for MAE [23]:

$$ \lambda(p_0) = \frac{f_{P_0}(p_0)^{1/2}}{\int_0^1 f_{P_0}(p_0)^{1/2}\, dp_0} $$

is substituted into the high-rate distortion approximation for the MBRE criterion (17). Taking $R = \log_2(K)$, there
is a constant gap between the rates using the MBRE point density and the MAE point density for all distortion
values. This difference is:

$$ R_{\mathrm{MBRE}}(D_L) - R_{\mathrm{MAE}}(D_L) = \frac{1}{2} \log_2 \left( \frac{\left\| f_{P_0}(p_0)\, B(p_0) \right\|_{1/3}}{\left\| f_{P_0}(p_0) \right\|_{1/2} \int_0^1 B(p_0)\, dp_0} \right). $$

The closer the ratio inside the logarithm is to one, the closer the MBRE- and MAE-optimal quantizers.
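
The rate gap can be estimated numerically for the Gaussian model. The sketch below is a rough illustration under several assumptions: $B(p_0)$ is obtained as $-\tfrac{1}{2} J''(p_0)$ by central differences (valid because $J(p_0, a)$ is linear in $p_0$), the norms are read as $\|g\|_{1/3} = (\int g^{1/3})^3$ and $\|g\|_{1/2} = (\int g^{1/2})^2$, the factor of one half comes from $R = \log_2 K$ with distortion scaling as $1/K^2$, and integrals use a midpoint rule. Function names are hypothetical.

```python
import math


def bayes_risk(p0, mu=1.0, sigma=1.0, c10=1.0, c01=1.0):
    """J(p0) from (4) for the Gaussian model (20)."""
    q = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))
    t = (sigma / mu) * math.log(c10 * p0 / (c01 * (1.0 - p0)))
    return (c10 * p0 * q(mu / (2.0 * sigma) + t)
            + c01 * (1.0 - p0) * q(mu / (2.0 * sigma) - t))


def b_coeff(p0, h=1e-4, **model):
    """B(p0) = -J''(p0) / 2 by central differences, clamped at zero."""
    h = min(h, 0.5 * p0, 0.5 * (1.0 - p0))
    second = (bayes_risk(p0 + h, **model) - 2.0 * bayes_risk(p0, **model)
              + bayes_risk(p0 - h, **model)) / (h * h)
    return max(-0.5 * second, 0.0)


def rate_gap(density, n=2000, **model):
    """R_MBRE(D_L) - R_MAE(D_L) in bits, by midpoint-rule integration."""
    step = 1.0 / n
    grid = [(j + 0.5) * step for j in range(n)]
    b_vals = [b_coeff(p, **model) for p in grid]
    f_vals = [density(p) for p in grid]
    norm_fb = (sum((f * b) ** (1.0 / 3.0) for f, b in zip(f_vals, b_vals)) * step) ** 3
    norm_f = (sum(math.sqrt(f) for f in f_vals) * step) ** 2
    int_b = sum(b_vals) * step
    return 0.5 * math.log2(norm_fb / (norm_f * int_b))


if __name__ == "__main__":
    uniform = lambda p: 1.0
    print("equal costs:  ", rate_gap(uniform))
    print("c10=1, c01=4: ", rate_gap(uniform, c10=1.0, c01=4.0))
```

By Hölder's inequality the gap can never be positive, so a negative value simply quantifies how many bits the MBRE point density saves relative to the MAE point density at the same distortion.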



A. Uniformly Distributed $P_0$

We first look at the setting in which all prior probabilities are equally likely. The MBRE of the MBRE-optimal
quantizer and a quantizer designed to minimize MAE with respect to $f_{P_0}(p_0)$ are plotted in Fig. 2. (The optimal
MAE quantizer for the uniform distribution is the uniform quantizer.) The plot shows MBRE as a function of $K$;
the solid line with circle markers is the MBRE-optimal quantizer and the dotted line with asterisk markers is the
MAE-optimal quantizer. $D_L$, the high-rate approximation to the distortion-rate function, is plotted in Fig. 3.

The performance of both quantizers is similar, but the MBRE-optimal quantizer always performs better or equally.
For $K = 1, 2$, the two quantizers are identical, as seen in Fig. 4(a)-(b). The plots in Fig. 4 show $J(p_0, v_K(p_0))$ as solid
and dotted lines for the MBRE- and MAE-optimal quantizers respectively; the markers are the representation points.
The gray line is $J(p_0)$, the Bayes risk with unquantized prior probabilities. For $K = 3, 4$, the representation points
for the MBRE-optimal quantizer are closer to $p_0 = 1/2$ than the uniform quantizer. This is because the area under
the point density function $\lambda(p_0)$ shown in Fig. 5 is concentrated in the center. Each increment of $K$ is associated

3 As shown by Kassam [21], minimizing the MAE criterion also minimizes the absolute distance between the cumulative distribution function 
of the source and the induced cumulative distribution function of the quantized output. Since the induced distribution from quantization is used 
as the population prior distribution for hypothesis testing, requiring this induced distribution to be close to the true unquantized distribution 
is reasonable. If distance between probability distributions is to be minimized according to the Kullback-Leibler discrimination between 
the true and induced distributions (which is defined in terms of likelihood ratios), an application of Pinsker's inequality shows that a small 
absolute difference is requisite [22]. Although a reasonable criterion, MAE is suboptimal for hypothesis testing performance as seen in the 
examples. 

Fig. 2. MBRE for uniformly distributed $P_0$ and Bayes costs $c_{10} = c_{01} = 1$ plotted on a logarithmic scale as a function of the number
of quantization levels $K$; the solid line with circle markers is the MBRE-optimal quantizer and the dotted line with asterisk markers is the
MAE-optimal uniform quantizer.

Fig. 3. High-rate approximation of distortion-rate function $D_L$ for uniformly distributed $P_0$ and Bayes costs $c_{10} = c_{01} = 1$; the solid line
is the MBRE-optimal quantizer and the dotted line is the MAE-optimal uniform quantizer.

Fig. 4. Quantizers for uniformly distributed $P_0$ and Bayes costs $c_{10} = c_{01} = 1$. $J(p_0, v_K(p_0))$ is plotted for (a) $K = 1$, (b) $K = 2$,
(c) $K = 3$, and (d) $K = 4$; the markers, circle and asterisk for the MBRE-optimal and MAE-optimal quantizers respectively, are the
representation points $\{a_k\}$. The gray line is the unquantized Bayes risk $J(p_0)$.

Fig. 5. Optimal MBRE point density for uniformly distributed $P_0$ and Bayes costs $c_{10} = c_{01} = 1$.

Fig. 6. MBRE for uniformly distributed $P_0$ and Bayes costs $c_{10} = 1$, $c_{01} = 4$ plotted on a logarithmic scale as a function of the number
of quantization levels $K$; the solid line with circle markers is the MBRE-optimal quantizer and the dotted line with asterisk markers is the
MAE-optimal uniform quantizer.



with a large reduction in Bayes risk. There is a very large performance improvement from K = 1 to K = 2. 

In Fig. 6, Fig. 7, Fig. 8, and Fig. 9, similar plots to those above are given for the case when the Bayes costs $c_{10}$
and $c_{01}$ are unequal. The unequal costs skew the Bayes risk function and consequently the representation point
locations and point density function. The difference in performance between the MBRE-optimal and MAE-optimal
quantizers is greater in this example because the MAE criterion cannot incorporate the Bayes costs, which factor
into MBRE calculation.




Fig. 7. High-rate approximation of distortion-rate function $D_L$ for uniformly distributed $P_0$ and Bayes costs $c_{10} = 1$, $c_{01} = 4$; the solid
line is the MBRE-optimal quantizer and the dotted line is the MAE-optimal uniform quantizer.

Fig. 8. Quantizers for uniformly distributed $P_0$ and Bayes costs $c_{10} = 1$, $c_{01} = 4$. $J(p_0, v_K(p_0))$ is plotted for (a) $K = 1$, (b) $K = 2$,
(c) $K = 3$, and (d) $K = 4$; the markers, circle and asterisk for the MBRE-optimal and MAE-optimal quantizers respectively, are the
representation points $\{a_k\}$. The gray line is the unquantized Bayes risk $J(p_0)$.

Fig. 9. Optimal MBRE point density for uniformly distributed $P_0$ and Bayes costs $c_{10} = 1$, $c_{01} = 4$.



B. Beta Distributed $P_0$

Now, we look at a non-uniform distribution for $P_0$, in particular the Beta(5, 2) distribution. The probability density
function is shown in Fig. 10. The MBRE of the MBRE-optimal and MAE-optimal quantizers are in Fig. 11. Here,
there are also large improvements in performance with an increase in $K$. The high-rate approximation to the
distortion-rate function for this example is given in Fig. 12.

The representation points $\{a_k\}$ are most densely distributed where $\lambda(p_0)$, plotted in Fig. 13, has mass. In particular,
more representation points are in the right half of the domain than in the left, as seen in Fig. 14.




Fig. 10. The probability density function $f_{P_0}(p_0)$ for the Beta(5, 2) distribution.

Fig. 11. MBRE for Beta(5, 2) distributed $P_0$ and Bayes costs $c_{10} = c_{01} = 1$ plotted on a logarithmic scale as a function of the number
of quantization levels $K$; the solid line with circle markers is the MBRE-optimal quantizer and the dotted line with asterisk markers is the
MAE-optimal uniform quantizer.

Fig. 12. High-rate approximation of distortion-rate function $D_L$ for Beta(5, 2) distributed $P_0$ and Bayes costs $c_{10} = c_{01} = 1$; the solid
line is the MBRE-optimal quantizer and the dotted line is the MAE-optimal uniform quantizer.



VI. Implications on Human Decision Making 

In the previous sections, we formulated the minimum MBRE quantization problem and discussed how to find 
the optimal MBRE quantizer. Having established the mathematical foundations of hypothesis testing with quantized 
priors, we may explore the implications of such resource-constrained decision making on human affairs. 

Let us consider the particular setting for human decision making mentioned in Section I: a referee determining
whether a player has committed a foul or not using both his or her noisy observation and prior experience. The
fraction of plays in which a player commits a foul is that player's prior probability for $h_1$. Over the population
of players, there is a distribution of prior probabilities. Also as mentioned in Section I, human decision makers

Fig. 13. Optimal MBRE point density for Beta(5, 2) distributed $P_0$ and Bayes costs $c_{10} = c_{01} = 1$.

Fig. 14. Quantizers for Beta(5, 2) distributed $P_0$ and Bayes costs $c_{10} = 1$, $c_{01} = 4$. $J(p_0, v_K(p_0))$ is plotted for (a) $K = 1$, (b) $K = 2$,
(c) $K = 3$, and (d) $K = 4$; the markers, circle and asterisk for the MBRE-optimal and MAE-optimal quantizers respectively, are the
representation points $\{a_k\}$. The gray line is the unquantized Bayes risk $J(p_0)$.

categorize into a small number of categories due to limitations in information processing capacity [3]. Decisions by 
humans may be modeled via quantization of the distribution of prior probabilities and the use of the quantization 
level centroid of the category in which a player falls as the prior probability when performing hypothesis testing 
on that player's action. 

Therefore, a referee will do a better job with more categories rather than fewer. A police officer confronting 
an individual with whom he or she has prior experience will make a better decision if he or she has the mental 
categories 'probably violent,' 'possibly violent or nonviolent,' and 'probably nonviolent,' versus just 'violent' and 
'nonviolent.' Similarly, a doctor will have a smaller probability of error when interpreting a blood test if he or she 
knows the prior probability of the test turning out positive for many categorizations of patients rather than just 
one for the entire population at large. Additional examples could be given for a variety of decision-making tasks. 
Implications of this sort are not surprising. However, when one additional component is added to the decision- 
making scenario, some fairly interesting implications arise. Next, we look at the case when the quantization of two 
distinct populations is done separately. 

We discuss mathematically unavoidable consequences of quantized prior hypothesis testing when quantizing the
prior probability for a minority population and the prior probability for a majority population separately, while taking
the prior probability distributions $f_{P_0}(p_0)$ of the two populations to be identical. Although majority and minority populations
can be defined along any socially observable dimension, such as gender or age [24], for ease of exposition we use 
race, and more specifically use 'white' and 'black' to denote the two populations. Although there is some debate 
in the social cognition literature [25], it is thought that race and gender categorization is essentially automatic, 
particularly when a human actor lacks the motivation, time, or cognitive capacity to think deeply. 

We can extend the definition of MBRE to two populations as: 

$$D^{(2)} = \frac{w}{w+b}\, E[J(P_0, v_{K_w}(P_0))] + \frac{b}{w+b}\, E[J(P_0, v_{K_b}(P_0))] - E[J(P_0)], \tag{28}$$

where $w$ is the number of whites encountered, $b$ is the number of blacks encountered,$^4$ $K_w$ is the number of points 
in the quantizer for whites, and $K_b$ is the number of points in the quantizer for blacks. In order to find the optimal 
allocation of the total quota of representation points $K_t = K_w + K_b$, we minimize the two-population MBRE for each of 
the $K_t - 1$ possible allocations and choose the best one; more sophisticated algorithms developed for bit allocation to 
subbands in transform coding may also be used [27]. 
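
The exhaustive search over allocations can be sketched as follows. This is only an illustration under stated assumptions: `toy_mbre` is a hypothetical stand-in that lets the single-population MBRE decay as 1/K^2, whereas an actual study would substitute the MBRE of the optimal K-point quantizer for the prior and measurement model at hand. The function names are not from the paper.

```python
from typing import Callable, Tuple


def toy_mbre(k: int) -> float:
    """Hypothetical single-population MBRE of a K-point quantizer.

    Assumption for illustration only: distortion decays like 1/K^2.
    """
    return 1.0 / k**2


def allocate_points(w: int, b: int, k_total: int,
                    mbre: Callable[[int], float] = toy_mbre) -> Tuple[int, int, float]:
    """Try all K_t - 1 splits K_w + K_b = K_t; return the best (K_w, K_b, distortion)."""
    if k_total < 2:
        raise ValueError("need at least one point per population")
    best = None
    for k_w in range(1, k_total):
        k_b = k_total - k_w
        # Two-population distortion: interaction-weighted average of the
        # per-population MBREs (the priors are assumed identical).
        d = (w * mbre(k_w) + b * mbre(k_b)) / (w + b)
        if best is None or d < best[2]:
            best = (k_w, k_b, d)
    return best


k_w, k_b, d = allocate_points(w=80, b=20, k_total=7)
print(k_w, k_b)
```

With w = 80, b = 20, and seven points, this toy distortion assigns four points to the majority population and three to the minority population. The split depends on the assumed distortion curve.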

Fryer and Jackson have previously suggested that it is better to allocate more representation points to the majority 
population than to the minority population [28]. With two separate scalar quantizers, but a single size constraint, 
optimizing over $v_{K_w}(\cdot)$ and $v_{K_b}(\cdot)$ yields the same result. Due to the monotonicity result in Sec. III-D, the 
MBRE for members of the minority group is greater than that for the majority group. 

$^4$ One might assume that $w$ and $b$ are simply the number of whites and blacks in the general population; however, these numbers should 
actually be based on the social interaction pattern of the decision maker. Due to segregation in social interaction, see e.g. [26] and references 
therein, there is greater intra-population interaction than inter-population interaction. The decision maker has more training data from 
intra-population interaction. 

Assuming white decision makers have $w > b$ and black decision makers have $b > w$, analysis of quantized prior 
Bayesian hypothesis testing predicts that there should be own-race bias in decision making. This prediction is in 
fact borne out experimentally. A large body of literature in face recognition shows exactly the predicted own-race 
bias effect, observed colloquially as "they [other-race persons] all look alike." In particular, both parts of the Bayes 
risk, $p_E^{\mathrm{I}}$ and $p_E^{\mathrm{II}}$, increase when trying to recognize members of the opposite population [29]. Own-race 
bias in face recognition has been verified through laboratory experimentation; however, similar effects have also been 
observed in natural experiments through econometric studies. 

It has been found that the addition of police officers of a given race is associated with an increase in the number 
of arrests of suspects of a different race but has little impact on same-race arrests. The effect is more pronounced 
for minor offenses where the prior probability presumably plays a bigger role than the measurement [30]. There 
are similar own-race bias effects in the decision by police to search a vehicle during a traffic stop [31], in the 
decision of human resource professionals to not hire [32], and in the decision of National Basketball Association 
(NBA) referees to call a foul [33]. The rate of searching, the rate of not hiring, and the rate of foul calling are all 
greater when the decision-maker is of a different race than the driver, applicant, and player, respectively. A major 
difficulty in interpreting these econometric studies, however, is that the ground truth is not known. Higher rates 
may be explained by either greater $p_E^{\mathrm{I}}$ or smaller $p_E^{\mathrm{II}}$. 

Since ground truth is lacking in econometric studies, it is not clear how to interpret a finding that white referees 
call more fouls on black players and that black referees call more fouls on white players. This phenomenon cannot 
simply be explained by a larger probability of decision error. The Bayes risk must be teased apart into its constituent 
parts and the Bayes costs must be examined in detail. 

The measurable quantity in an econometrics study is the probability that a foul is called: 

$$\Pr[\hat{H}_K = h_1] = 1 - p_0 + p_0\, p_E^{\mathrm{I}}(v_K(p_0)) - (1 - p_0)\, p_E^{\mathrm{II}}(v_K(p_0)). \tag{29}$$

Looking at the average performance of a white referee over the populations of black and white players, we compare 
the expected foul rates on whites and blacks ($K_b < K_w$): 

$$\Delta = E\big[\Pr[\hat{H}_{K_b} = h_1] - \Pr[\hat{H}_{K_w} = h_1]\big]. \tag{30}$$

If this discrimination quantity $\Delta$ is greater than zero, then the white referee is calling more fouls on blacks. If $\Delta$ 
is less than zero, then the referee is calling more fouls on whites. The $\Delta$ expression may be written as: 

$$\Delta(c_{10}, c_{01}) = E\big[P_0\, p_E^{\mathrm{I}}(v_{K_b}(P_0)) - (1 - P_0)\, p_E^{\mathrm{II}}(v_{K_b}(P_0))\big] - E\big[P_0\, p_E^{\mathrm{I}}(v_{K_w}(P_0)) - (1 - P_0)\, p_E^{\mathrm{II}}(v_{K_w}(P_0))\big]. \tag{31}$$

The dependence of $\Delta$ on $c_{10}$ and $c_{01}$ is explicit on the left side of (31) and is implicit in the error probabilities on 
the right side. The value of $\Delta$ also depends on the unquantized prior distribution $f_{P_0}(p_0)$, the measurement model, 
and the quantizer. 

If the prior distribution and measurement model are fixed, and the MBRE-optimal quantizer is used, we find that 
the regions in the $c_{10}$-$c_{01}$ plane where a white referee would call more fouls on blacks and where a white referee 
would call more fouls on whites are half-planes. For the uniform prior $f_{P_0}(p_0)$, the dividing line between the two 
regions is exactly $c_{01} = c_{10}$. For the Beta(5,2) prior, the dividing line is $c_{01} = m c_{10}$, where $m > 1$. 

Using the division of the $c_{10}$-$c_{01}$ plane into two parts, we can now interpret the econometric findings in the 
NBA referee study [33] and related results [30]-[32]. The NBA race bias observations can be generated from the 
quantized prior hypothesis testing model only if the Bayes risk error has costs $c_{01} > c_{10}$ for a uniform prior or 
costs $c_{01} \gg c_{10}$ for a Beta(5,2) prior. The choice of Bayes costs with $c_{01}$ greater than $c_{10}$ implies that a referee 
can tolerate more instances of calling fouls on plays that are not fouls rather than the opposite. This assignment of 
costs has been called the precautionary principle in some contexts. Very simply, the precautionary principle states 
"better safe than sorry." 

Taken together, the hypothesis testing with quantized priors model, the phenomenon of racial segregation [26], and 
results from econometric studies [30]-[33] suggest that referees, police officers, and human resources professionals 
all follow the precautionary principle. 
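
As a rough numerical illustration of (29)-(31), the sketch below evaluates the discrimination quantity under assumptions that are not taken from the analysis above: a scalar Gaussian measurement model (mean 0 under h0, mean `mu` under h1, common standard deviation `sigma`), a uniform prior on p0, and a uniform quantizer with midpoint representation points standing in for the MBRE-optimal quantizer. All function names are hypothetical.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a, c10, c01, mu=1.0, sigma=1.0):
    """Type I and type II error probabilities of the likelihood ratio test
    designed for prior a = Pr[h0] (assumed Gaussian measurement model)."""
    thr = mu / 2.0 + (sigma**2 / mu) * math.log(c10 * a / (c01 * (1.0 - a)))
    p1 = q_func(thr / sigma)          # decide h1 although h0 holds
    p2 = q_func((mu - thr) / sigma)   # decide h0 although h1 holds
    return p1, p2


def foul_rate(k, c10, c01, n=10000):
    """Average of (29) over a uniform prior on p0 (midpoint rule), using a
    uniform k-cell quantizer with midpoint representation points (assumption)."""
    total = 0.0
    for i in range(n):
        p0 = (i + 0.5) / n
        cell = min(int(p0 * k), k - 1)
        a = (cell + 0.5) / k          # quantized prior used to design the test
        p1, p2 = error_probs(a, c10, c01)
        total += 1.0 - p0 + p0 * p1 - (1.0 - p0) * p2
    return total / n


def discrimination(k_b, k_w, c10, c01):
    """Expected foul rate with k_b points minus that with k_w points, as in (30)."""
    return foul_rate(k_b, c10, c01) - foul_rate(k_w, c10, c01)


print(discrimination(k_b=2, k_w=5, c10=1.0, c01=4.0))
```

In this symmetric setup the computed quantity vanishes when the two costs are equal and changes sign when they are swapped, which agrees with a dividing line at $c_{01} = c_{10}$ for the uniform prior. The sign and size for unequal costs depend on the assumed model and quantizer.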
VII. Conclusion and Future Work 

We have looked at Bayesian hypothesis testing when there is a distribution of prior probabilities, but the decision 
maker may only use a quantized version of the true prior probability in designing a decision rule. Considering the 
problem of finding the optimal quantizer for this purpose, we have defined a new fidelity criterion based on the 
Bayes risk function. For this criterion, MBRE, we have determined the conditions that an optimal quantizer satisfies 
and worked through a high-rate approximation to the distortion. M-ary hypothesis testing with M > 2 requires 
vector quantization rather than scalar quantization, but determining the Lloyd-Max conditions and high-rate theory 
is no different conceptually due to the geometry of the Bayes risk function and mismatched Bayes risk function. 
For the M-ary hypothesis testing case, a multivariate distribution such as the M-dimensional Dirichlet distribution 
[16] is needed for $f_P(p)$. Previous, though significantly different, work on quantization for hypothesis testing was 
unable to directly minimize the Bayes risk, as was accomplished in this work. 

The mathematical theory of quantized prior hypothesis testing formulated here leads to a generative model of 
discriminative behavior when combined with theories of social cognition and empirical facts about social segregation. 
This biased decision making arises despite having identical distributions for different populations and despite no 
malicious intent on the part of the decision maker. We also discussed how the choice of Bayes costs affects detection 
probabilities; in particular, the precautionary principle leads to a higher detection probability for the opposite race, 
whereas a more optimistic view leads to a higher detection probability for the own race. Such a phenomenon 
of pessimistic or optimistic attitude fundamentally altering the nature of discrimination seems not to have been 
described before. Discrimination on the basis of race, gender, and other socially observable characteristics has been 
a troublesome social problem, but appears to be a permanent artifact of the automaticity of classification and the 
finite human capacity for information processing. 

There are many avenues along which to extend this work, such as dealing with decentralized detection and 
classification (with possible implications on jury decisions and elections), which may become game theoretic; 
consideration of additional noise before or after quantization of the prior probabilities; or the development of 
successively refinable quantizers (for decision makers that possess a memory hierarchy). One can also consider 
a restricted class of quantizers rather than considering optimal quantization. Such restriction may model further 
cognitive constraints on human decision makers. In particular, Fryer and Jackson have suggested a heuristic algorithm 
for quantizer design based on splitting groups [28], which is a rediscovery of the tree-structured vector quantizer 
(TSVQ) design algorithm given in [34, Fig. 20]. Beyond [34], there has been much recent development in the 
theory of TSVQ performance and recursive partitioning, which may prove useful. 

For the quantizer with $K = 1$, an alternative to the MBRE-optimal representation point: 

$$a^*_{\mathrm{MBRE}} = \arg\min_a \left\{ \int J(p, a)\, f_P(p)\, dp \right\}$$

is the min-max hypothesis testing representation point: 

$$a^*_{\text{min-max}} = \arg\min_a \left\{ \max_p J(p, a) \right\},$$

which is only equivalent in special cases. A distribution on the prior probabilities is needed to specify $a^*_{\mathrm{MBRE}}$, 
but not to specify $a^*_{\text{min-max}}$. One may consider extending the min-max idea to $K > 1$. This would involve an 
approach related to $\epsilon$-entropy [35, Sec. 6.1.2] and finding a cover for the unit simplex by $K$ sets of the form 
$\mathcal{R}_k = \{p \mid J(p, a_k) < D\}$, where all $p$ in $\mathcal{R}_k$ map to $a_k$ and $D$ is the same for all $\mathcal{R}_k$. 
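
The difference between the two single-point choices can be illustrated numerically for binary hypotheses. The sketch below is only an example under assumptions not made in the text: a scalar Gaussian measurement model (means 0 and `mu`, standard deviation `sigma`), equal Bayes costs by default, and a grid search over candidate points. The function names are hypothetical.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_probs(a, c10=1.0, c01=1.0, mu=1.0, sigma=1.0):
    """Error probabilities of the test designed for prior a = Pr[h0]
    (assumed Gaussian measurement model)."""
    thr = mu / 2.0 + (sigma**2 / mu) * math.log(c10 * a / (c01 * (1.0 - a)))
    return q_func(thr / sigma), q_func((mu - thr) / sigma)


def beta52_pdf(p: float) -> float:
    """Beta(5,2) density: 30 p^4 (1 - p)."""
    return 30.0 * p**4 * (1.0 - p)


def representation_points(pdf, c10=1.0, c01=1.0, n=2000, grid=1000):
    """Grid search for the K = 1 MBRE and min-max representation points."""
    ps = [(i + 0.5) / n for i in range(n)]
    mass = sum(pdf(p) for p in ps) / n       # numerical integral of the pdf
    mean = sum(p * pdf(p) for p in ps) / n   # numerical first moment
    best_avg = best_worst = None
    for j in range(1, grid):
        a = j / grid
        p1, p2 = error_probs(a, c10, c01)
        # J(p, a) = c10 p p1 + c01 (1 - p) p2 is linear in p, so its average
        # needs only the mass and mean, and its maximum is at p = 0 or p = 1.
        avg = c10 * mean * p1 + c01 * (mass - mean) * p2
        worst = max(c01 * p2, c10 * p1)
        if best_avg is None or avg < best_avg[1]:
            best_avg = (a, avg)
        if best_worst is None or worst < best_worst[1]:
            best_worst = (a, worst)
    return best_avg[0], best_worst[0]


a_mbre, a_minmax = representation_points(beta52_pdf)
print(a_mbre, a_minmax)
```

Under these assumptions the MBRE point for the Beta(5,2) prior lies near the prior mean 5/7, while the min-max point stays at 1/2 regardless of the prior. For a uniform prior the two coincide, which is one of the special cases of equivalence.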

The general theme of machine learning for the explicit purpose of hypothesis testing, within which this work falls, 
is receiving increasing attention; framing the hypothesis testing scenario discussed here in terms of probabilistic 
graphical models of categorization, e.g. the latent Dirichlet allocation model [36] and the hierarchical Dirichlet 
process mixture model [37], may prove insightful as well. 